Efficient moving point handling for incremental 3D manifold reconstruction
As incremental Structure from Motion algorithms become effective, a good
sparse point cloud representing the map of the scene becomes available
frame-by-frame. From the 3D Delaunay triangulation of these points,
state-of-the-art algorithms build a rough manifold model of the scene. These
algorithms incrementally integrate new points into the 3D reconstruction only if
their position estimate does not change. Indeed, whenever a point moves in a 3D
Delaunay triangulation, for instance because its estimate gets refined, a set
of tetrahedra has to be removed and replaced with new ones to maintain the
Delaunay property; managing the manifold reconstruction thus becomes
complex and entails a potentially large overhead. In this paper we investigate
different approaches and propose an efficient policy to deal with moving
points in the manifold estimation process. We tested our approach on four
sequences of the KITTI dataset and show the effectiveness of our proposal in
comparison with state-of-the-art approaches.
Comment: Accepted at the International Conference on Image Analysis and Processing
(ICIAP 2015)
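The cost of a moving point can be seen in the standard Bowyer-Watson insertion step: re-inserting a point at its refined position removes every tetrahedron whose circumsphere contains the new position and re-triangulates the resulting cavity. The minimal pure-Python sketch below computes that conflict set with the classical orient3d and in-sphere determinant predicates. It illustrates the general technique, not the policy proposed in the paper; the function names are hypothetical, and it uses plain floating point, whereas a robust implementation would use exact or adaptive predicates.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def det(m: List[List[float]]) -> float:
    """Determinant by cofactor expansion along the first row (fine for 3x3 and 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col in range(len(m)):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def orient3d(a: Point, b: Point, c: Point, d: Point) -> float:
    """Signed volume term of tetrahedron (a, b, c, d); its sign encodes vertex order."""
    return det([[a[i] - d[i] for i in range(3)],
                [b[i] - d[i] for i in range(3)],
                [c[i] - d[i] for i in range(3)]])


def in_circumsphere(a: Point, b: Point, c: Point, d: Point, e: Point) -> bool:
    """True if e lies strictly inside the circumsphere of tetrahedron (a, b, c, d)."""
    rows = []
    for p in (a, b, c, d):
        v = [p[i] - e[i] for i in range(3)]
        rows.append(v + [v[0] ** 2 + v[1] ** 2 + v[2] ** 2])
    # Multiplying by the orientation makes the test independent of vertex order.
    return det(rows) * orient3d(a, b, c, d) > 0


def conflict_tetrahedra(points: Sequence[Point],
                        tets: Sequence[Tuple[int, int, int, int]],
                        new_pos: Point) -> List[int]:
    """Indices of tetrahedra whose circumsphere contains new_pos (the Bowyer-Watson cavity)."""
    return [k for k, (i, j, l, m) in enumerate(tets)
            if in_circumsphere(points[i], points[j], points[l], points[m], new_pos)]
```

Removing the point from its old position also deletes its incident tetrahedra, so a single move can invalidate two regions of the triangulation, together with any manifold bookkeeping attached to them.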
Mesh-based 3D Textured Urban Mapping
In the era of autonomous driving, urban mapping represents a core step to let
vehicles interact with the urban context. Successful mapping algorithms proposed
in the last decade build the map from the data of a
single sensor. The focus of the system presented in this paper is twofold: the
joint estimation of a 3D map from lidar data and images, based on a 3D mesh,
and its texturing. Indeed, even if most surveying vehicles for mapping are
equipped with cameras and lidar, existing mapping algorithms usually rely on
either images or lidar data; moreover, both image-based and lidar-based systems
often represent the map as a point cloud, while a continuous textured mesh
representation would be useful for visualization and navigation purposes. In
the proposed framework, we combine the accuracy of 3D lidar data with the
dense information and appearance carried by the images, estimating a
visibility-consistent map from the lidar measurements and refining it
photometrically through the acquired images. We evaluate the proposed framework
on the KITTI dataset and show the performance improvement with respect
to two state-of-the-art urban mapping algorithms and two surface
reconstruction algorithms widely used in Computer Graphics.
Comment: accepted at IROS 201
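Coupling lidar geometry with image appearance requires, at minimum, projecting 3D map points into the calibrated images. The sketch below shows that building block with an ideal pinhole camera, assuming known intrinsics (`fx`, `fy`, `cx`, `cy`) and extrinsics (`R`, `t`) and ignoring lens distortion. It is not the paper's texturing or photometric refinement pipeline, and all names are illustrative; a visibility-consistent texturing step would also need an occlusion test against the mesh, which is omitted here.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def project(point_world: Vec3,
            R: Sequence[Sequence[float]],
            t: Vec3,
            fx: float, fy: float, cx: float, cy: float,
            width: int, height: int) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates with a pinhole camera x_cam = R p + t.

    Returns None if the point is behind the camera or falls outside the image.
    """
    xc = [sum(R[r][k] * point_world[k] for k in range(3)) + t[r] for r in range(3)]
    if xc[2] <= 0:
        return None  # behind the camera: this view cannot texture the point
    u = fx * xc[0] / xc[2] + cx
    v = fy * xc[1] / xc[2] + cy
    if not (0 <= u < width and 0 <= v < height):
        return None  # projects outside the image bounds
    return (u, v)


def sample_color(image: List[List[int]], uv: Tuple[float, float]) -> int:
    """Intensity of the pixel that contains the image coordinates (u, v)."""
    u, v = uv
    return image[int(v)][int(u)]
```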
Multi-View Stereo with Single-View Semantic Mesh Refinement
While 3D reconstruction is a well-established and widely explored research
topic, semantic 3D reconstruction has only recently witnessed an increasing
share of attention from the Computer Vision community. In fact, semantic annotations
make it possible to enforce strong class-dependent priors, such as planarity for ground
and walls, which can be exploited to refine the reconstruction, often resulting
in non-trivial performance improvements. State-of-the-art methods propose
volumetric approaches to fuse RGB image data with semantic labels; even if
successful, they do not scale well and fail to output high resolution meshes.
In this paper we propose a novel method to refine both the geometry and the
semantic labeling of a given mesh. We refine the mesh geometry by applying a
variational method that optimizes a composite energy made of a state-of-the-art
pairwise photometric term and a single-view term that models the semantic
consistency between the labels of the 3D mesh and those of the segmented
images. We also update the semantic labeling through a novel Markov Random
Field (MRF) formulation that, together with the classical data and smoothness
terms, takes into account class-specific priors estimated directly from the
annotated mesh. This is in contrast to state-of-the-art methods that are
typically based on handcrafted or learned priors. Together with the very recent
and seminal work of [M. Blaha et al arXiv:1706.08336, 2017], we are the first
to propose the use of semantics inside a mesh refinement framework.
Unlike [M. Blaha et al arXiv:1706.08336, 2017], which adopts a more
classical pairwise comparison to estimate the flow of the mesh, we apply a
single-view comparison between the semantically annotated image and the current
3D mesh labels; this improves robustness in case of noisy segmentations.
Comment: 3D Reconstruction Meets Semantics, ICCV workshop
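The labeling update combines a data term and a smoothness term (plus class-specific priors) in an MRF over the mesh. The sketch below shows only the data term and a Potts smoothness term on a facet adjacency graph, minimized with Iterated Conditional Modes (ICM). ICM is chosen here purely for brevity; the abstract does not say which optimizer the paper uses, and the class-specific priors could, as an assumption, be folded into the `unary` costs. All function names and parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def mrf_energy(labels: Sequence[int],
               unary: Sequence[Sequence[float]],
               edges: Sequence[Tuple[int, int]],
               smooth_weight: float) -> float:
    """Data term plus Potts smoothness: a constant penalty for each edge whose facets differ."""
    data = sum(unary[f][labels[f]] for f in range(len(labels)))
    smooth = sum(smooth_weight for (f, g) in edges if labels[f] != labels[g])
    return data + smooth


def icm(unary: Sequence[Sequence[float]],
        edges: Sequence[Tuple[int, int]],
        smooth_weight: float,
        max_iters: int = 10) -> List[int]:
    """Iterated Conditional Modes: greedily relabel one facet at a time until stable."""
    n, n_labels = len(unary), len(unary[0])
    neighbours: Dict[int, List[int]] = {f: [] for f in range(n)}
    for f, g in edges:
        neighbours[f].append(g)
        neighbours[g].append(f)
    # Start from the best label under the data term alone.
    labels = [min(range(n_labels), key=lambda lab: unary[f][lab]) for f in range(n)]
    for _ in range(max_iters):
        changed = False
        for f in range(n):
            best = min(range(n_labels),
                       key=lambda lab: unary[f][lab]
                       + smooth_weight * sum(labels[g] != lab for g in neighbours[f]))
            if best != labels[f]:
                labels[f] = best
                changed = True
        if not changed:
            break
    return labels
```

With a chain of four facets where one facet weakly prefers a different label (a noisy segmentation), the smoothness term pulls it back to agree with its neighbours.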
Attention Mechanisms for Object Recognition with Event-Based Cameras
Event-based cameras are neuromorphic sensors capable of efficiently encoding
visual information in the form of sparse sequences of events. Being
biologically inspired, they are commonly used to exploit some of the
computational and power consumption benefits of biological vision. In this
paper we focus on a specific feature of vision: visual attention. We propose
two attentive models for event-based vision: an algorithm that tracks event
activity within the field of view to locate regions of interest, and a
fully differentiable attention procedure based on the DRAW neural model. We
highlight the strengths and weaknesses of the proposed methods on four
datasets (the Shifted N-MNIST, Shifted MNIST-DVS, CIFAR10-DVS and N-Caltech101
collections), using the Phased LSTM recognition network as a baseline reference
model, and obtain improvements in terms of both translation and scale invariance.
Comment: WACV 2019 camera-ready submission
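One simple way to realize the idea of tracking event activity to locate a region of interest, assumed here and not taken from the paper, is an exponentially decaying per-pixel activity map scanned with a fixed-size window. The time constant `tau`, the window size, and the function names are hypothetical; event polarity is ignored.

```python
import math
from typing import Iterable, List, Tuple

Event = Tuple[float, int, int]  # (timestamp in seconds, x, y); polarity ignored


def activity_map(events: Iterable[Event], width: int, height: int,
                 t_now: float, tau: float) -> List[List[float]]:
    """Exponentially decayed event count per pixel: recent events weigh more."""
    act = [[0.0] * width for _ in range(height)]
    for t, x, y in events:
        if t <= t_now:  # events after t_now have not happened yet
            act[y][x] += math.exp(-(t_now - t) / tau)
    return act


def most_active_window(act: List[List[float]], win: int) -> Tuple[int, int]:
    """Top-left corner (x, y) of the win x win window with the largest total activity."""
    height, width = len(act), len(act[0])
    best, best_xy = -1.0, (0, 0)
    for y0 in range(height - win + 1):
        for x0 in range(width - win + 1):
            s = sum(act[y][x] for y in range(y0, y0 + win) for x in range(x0, x0 + win))
            if s > best:
                best, best_xy = s, (x0, y0)
    return best_xy
```

The brute-force window scan keeps the sketch short; an integral image would make each window sum constant-time.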
Facetwise Mesh Refinement for Multi-View Stereo
Mesh refinement is a fundamental step for accurate Multi-View Stereo. It
modifies the geometry of an initial manifold mesh to minimize the photometric
error induced in a set of camera pairs. This initial mesh is usually the output
of volumetric 3D reconstruction based on min-cut over Delaunay Triangulations.
Such methods produce a significant number of non-manifold vertices; therefore,
they require a vertex split step to explicitly repair them. In this paper, we
extend this method to preemptively fix the non-manifold vertices by reasoning
directly on the Delaunay Triangulation and avoid most vertex splits. The main
contribution of this paper addresses the problem of choosing the camera pairs
adopted by the refinement process. We treat the problem as a mesh labeling
process, where each label corresponds to a camera pair. Unlike
state-of-the-art methods, which use each camera pair to refine all the visible
parts of the mesh, we choose, for each facet, the best pair that enforces both
the overall visibility and coverage. The refinement step is applied to each
facet using only the selected camera pair. This facetwise refinement helps
apply the process as evenly as possible.
Comment: Accepted as oral presentation at ICPR202
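To make the idea of one camera-pair label per facet concrete, the sketch below assigns each facet the visible pair whose less frontal view is as frontal as possible. This is a deliberately simplified, hypothetical scoring rule: the paper treats the choice as a labeling problem that also accounts for coverage, which this per-facet argmax ignores. The `visible` matrix (for example from a depth test) and all names are assumptions.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def view_cosine(center: Vec3, normal: Vec3, cam: Vec3) -> float:
    """Cosine between the facet normal and the direction from the facet to the camera."""
    d = [c - p for c, p in zip(cam, center)]
    nd = math.sqrt(sum(x * x for x in d))
    nn = math.sqrt(sum(x * x for x in normal))
    return sum(a * b for a, b in zip(normal, d)) / (nd * nn)


def best_pair_per_facet(centers: Sequence[Vec3], normals: Sequence[Vec3],
                        cams: Sequence[Vec3],
                        visible: Sequence[Sequence[bool]]) -> List[Optional[Tuple[int, int]]]:
    """For each facet, the camera pair whose worse view is the most frontal.

    visible[f][c] says whether camera c sees facet f.
    Facets seen by fewer than two cameras get None.
    """
    labels: List[Optional[Tuple[int, int]]] = []
    for f, (ctr, nrm) in enumerate(zip(centers, normals)):
        seen = [c for c in range(len(cams)) if visible[f][c]]
        best, best_score = None, -math.inf
        for i, j in combinations(seen, 2):
            score = min(view_cosine(ctr, nrm, cams[i]), view_cosine(ctr, nrm, cams[j]))
            if score > best_score:
                best, best_score = (i, j), score
        labels.append(best)
    return labels
```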
TAPA-MVS: Textureless-Aware PAtchMatch Multi-View Stereo
One of the most successful approaches in Multi-View Stereo estimates a depth
map and a normal map for each view via PatchMatch-based optimization and fuses
them into a consistent 3D point cloud. This approach relies on
photo-consistency to evaluate the goodness of a depth estimate. It generally
produces very accurate results; however, the reconstructed model often lacks
completeness, especially in broad untextured areas where the
photo-consistency metrics are unreliable. Assuming that untextured areas are
piecewise planar, in this paper we generate novel PatchMatch hypotheses so as to
expand reliable depth estimates into neighboring untextured regions. At the same
time, we modify the photo-consistency measure so as to favor standard or novel
PatchMatch depth hypotheses depending on the textureness of the considered
area. We also propose a depth refinement step to filter wrong estimates and to
fill the gaps in both the depth maps and normal maps while preserving
discontinuities. The effectiveness of our new methods has been tested against
several state-of-the-art algorithms on the publicly available ETH3D dataset,
containing a wide variety of high- and low-resolution images.
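Normalized cross-correlation (NCC) is a common photo-consistency measure in PatchMatch-based Multi-View Stereo, and patch intensity variance is a simple proxy for textureness. The sketch below shows both on flattened patches and makes the failure mode in untextured areas visible: on a flat patch NCC is undefined, and the sketch returns 0.0 there by convention. The paper's exact measure and its textureness-dependent weighting are not given in the abstract; these functions are illustrative assumptions.

```python
import math
from typing import Sequence


def ncc(p: Sequence[float], q: Sequence[float]) -> float:
    """Normalized cross-correlation of two equally sized, flattened patches.

    Invariant to affine brightness changes; returns 0.0 when either patch is flat,
    where the measure carries no information.
    """
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    dp = [x - mp for x in p]
    dq = [y - mq for y in q]
    num = sum(a * b for a, b in zip(dp, dq))
    den = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dq))
    return num / den if den > 0 else 0.0


def textureness(p: Sequence[float]) -> float:
    """Intensity variance of a patch: a simple proxy for how textured it is."""
    m = sum(p) / len(p)
    return sum((x - m) ** 2 for x in p) / len(p)
```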
Robust Moving Objects Detection in Lidar Data Exploiting Visual Cues
Detecting moving objects in dynamic scenes from sequences of lidar scans is an important task in object tracking, mapping, localization, and navigation. Many works focus on change detection in previously observed scenes, while only a very limited amount of literature addresses moving object detection. The state-of-the-art method exploits Dempster-Shafer Theory to evaluate the occupancy of a lidar scan and to discriminate points belonging to the static scene from moving ones. In this paper we improve both the speed and the accuracy of this method by discretizing the occupancy representation and by removing false positives through visual cues. Many false positives lying on the ground plane are also removed thanks to a novel ground plane removal algorithm. Efficiency is improved through an octree indexing strategy. Experimental evaluation on the KITTI public dataset shows the effectiveness of our approach, both qualitatively and quantitatively, with respect to the state of the art.
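Dempster-Shafer occupancy reasoning typically fuses evidence with Dempster's rule of combination over the frame {free, occupied}, where mass on the whole frame represents "unknown". The sketch below implements that standard rule and also returns the conflict K, which is large when two pieces of evidence disagree about a cell. How the cited method assigns masses to lidar rays and how it turns conflict into a "moving" decision are not stated in the abstract, so those parts are left out and the names here are hypothetical.

```python
from typing import Tuple

Mass = Tuple[float, float, float]  # (m(free), m(occupied), m(unknown = {free, occupied}))


def combine(m1: Mass, m2: Mass) -> Tuple[Mass, float]:
    """Dempster's rule of combination on the frame {free, occupied}.

    Returns the fused masses and the conflict K; a high K between the current
    scan and the accumulated map hints at a cell whose state has changed.
    """
    f1, o1, u1 = m1
    f2, o2, u2 = m2
    conflict = f1 * o2 + o1 * f2
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    free = (f1 * f2 + f1 * u2 + u1 * f2) / norm
    occ = (o1 * o2 + o1 * u2 + u1 * o2) / norm
    unk = (u1 * u2) / norm
    return (free, occ, unk), conflict
```

Combining with the vacuous mass (0, 0, 1), which expresses total ignorance, leaves the other mass unchanged, a quick sanity check on any implementation of the rule.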
Background subtraction by combining Temporal and Spatio-Temporal histograms in the presence of camera movement
Background subtraction is the classical approach to differentiate moving objects in a scene from the static background when the camera is fixed. If the fixed-camera assumption does not hold, a frame registration step is performed before background subtraction. However, this registration step cannot perfectly compensate for camera motion, so errors such as displacements of pixels from their true registered positions occur. In this paper, we overcome these errors with a simple but effective background subtraction algorithm that combines Temporal and Spatio-Temporal approaches. The former models the temporal intensity distribution of each individual pixel. The latter classifies foreground and background pixels, taking into account the intensity distribution of each pixel's neighborhood. The experimental results show that our algorithm outperforms state-of-the-art systems in the presence of jitter, in spite of its simplicity.
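The sketch below illustrates why pooling histograms over a neighbourhood tolerates registration jitter. Each pixel keeps a temporal histogram of its intensities; a new pixel is labeled background if its intensity bin is frequent either in its own histogram or in the histogram pooled over a small window. The OR combination, the bin count, the `radius` and the threshold `tau` are all assumptions for illustration, not the paper's exact rule.

```python
from typing import List, Sequence

Frame = List[List[int]]  # grayscale intensities in [0, 255]


def build_histograms(history: Sequence[Frame], bins: int) -> List[List[List[int]]]:
    """Per-pixel temporal histogram of intensities over the history frames."""
    h, w = len(history[0]), len(history[0][0])
    hist = [[[0] * bins for _ in range(w)] for _ in range(h)]
    for frame in history:
        for y in range(h):
            for x in range(w):
                hist[y][x][frame[y][x] * bins // 256] += 1
    return hist


def classify(frame: Frame, hist: List[List[List[int]]], bins: int,
             radius: int, tau: float) -> List[List[bool]]:
    """True marks foreground. A pixel is background if its bin is frequent either in
    its own temporal histogram or in the histogram pooled over its neighbourhood."""
    h, w = len(frame), len(frame[0])
    fg = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            b = frame[y][x] * bins // 256
            own = hist[y][x]
            p_temporal = own[b] / sum(own)
            pooled_hits = pooled_total = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    pooled_hits += hist[yy][xx][b]
                    pooled_total += sum(hist[yy][xx])
            p_spatio = pooled_hits / pooled_total
            fg[y][x] = p_temporal < tau and p_spatio < tau
    return fg
```

In a frame shifted by one pixel, the edge pixels fail the purely temporal test but pass the neighbourhood test, while a genuinely new intensity fails both.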
Backward-Simulation Particle Smoother with a hybrid state for 3D vehicle trajectory, class and dimension simultaneous estimation
The estimation of the 3D trajectory, the class and the dimensions of a vehicle are three relevant tasks for traffic monitoring. They are usually performed by separate sub-systems, and only a few existing algorithms cope with the three tasks at the same time. However, if these tasks are integrated, trajectory estimation enforces temporal consistency on the classification, and at the same time the estimated vehicle class and dimensions can be used to increase the accuracy of the trajectory estimate. In this work, we propose an algorithm to estimate the 3D trajectory, the class and the dimensions of vehicles simultaneously by means of a Backward-Simulation Particle Smoother whose state contains both continuous (vehicle pose and dimensions) and discrete (vehicle class) quantities. To integrate the class estimate into the Particle Smoother, we model the class prediction as a Markov Chain. We performed experimental tests on both simulated and real datasets; they show that the pose and dimension estimation reaches centimeter accuracy and that the classification accuracy is higher than 95%.
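A hybrid particle carries both continuous quantities and a discrete class, and the class is predicted with a Markov chain. The sketch below shows only the forward prediction step for such particles, assuming a 1-D constant-velocity model as a stand-in for the full 3D pose and a hypothetical class transition matrix. The measurement update, resampling and the backward-simulation smoothing pass of the actual method are omitted, and all names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Particle:
    x: float        # 1-D position (stand-in for the full vehicle pose)
    v: float        # velocity
    length: float   # vehicle dimension, assumed constant over time
    cls: int        # discrete vehicle class index
    weight: float


def predict(particles: List[Particle], transition: Sequence[Sequence[float]],
            dt: float, accel_std: float, rng: random.Random) -> None:
    """Propagate continuous states with a constant-velocity model driven by random
    acceleration, and resample each particle's class from row `cls` of the
    Markov-chain transition matrix."""
    classes = list(range(len(transition)))
    for p in particles:
        a = rng.gauss(0.0, accel_std)
        p.x += p.v * dt + 0.5 * a * dt * dt
        p.v += a * dt
        p.cls = rng.choices(classes, weights=transition[p.cls])[0]


def class_marginal(particles: List[Particle], n_classes: int) -> List[float]:
    """Weighted fraction of particles in each class."""
    total = sum(p.weight for p in particles)
    probs = [0.0] * n_classes
    for p in particles:
        probs[p.cls] += p.weight / total
    return probs
```

With an identity transition matrix the classes never change, which makes a convenient check; a "sticky" matrix with large diagonal entries lets the class evolve slowly, so temporal consistency emerges from the prediction step itself.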